Removing Unwanted Variation from High Dimensional Data with Negative Controls
نویسندگان
چکیده
High dimensional data suffer from unwanted variation, such as the batch effects common in microarray data. Unwanted variation complicates the analysis of high dimensional data, leading to high rates of false discoveries, high rates of missed discoveries, or both. In many cases the factors causing the unwanted variation are unknown and must be inferred from the data. In such cases, negative controls may be used to identify the unwanted variation and separate it from the wanted variation. We present a new method, RUV-4, to adjust for unwanted variation in high dimensional data with negative controls. RUV-4 may be used when the goal of the analysis is to determine which of the features are truly associated with a given factor of interest. One nice property of RUV-4 is that it is relatively insensitive to the number of unwanted factors included in the model; this makes estimating the number of factors less critical. We also present a novel method for estimating the features’ variances that may be used even when a large number of unwanted factors are included in the model and the design matrix is full rank. We name this the “inverse method for estimating variances.” By combining RUV-4 with the inverse method, it is no longer necessary to estimate the number of unwanted factors at all. Using both real and simulated data we compare the performance of RUV-4 with that of other adjustment methods such as SVA, LEAPP, ICE, and RUV-2. We find that RUV-4 and its variants perform as well or better than other methods.
منابع مشابه
RLE plots: Visualizing unwanted variation in high dimensional data
Unwanted variation can be highly problematic and so its detection is often crucial. Relative log expression (RLE) plots are a powerful tool for visualizing such variation in high dimensional data. We provide a detailed examination of these plots, with the aid of examples and simulation, explaining what they are and what they can reveal. RLE plots are particularly useful for assessing whether a ...
متن کاملFeature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
متن کاملGender study gene expression data from Vawter et al . [ 2004 ]
Vawter et al. [2004] studied differences in gene expression between male and female patients. This gender study is an interesting benchmark for methods aiming at removing unwanted variation as it expected to be affected by several technical and biological factors: two microarray platforms, three different labs, three tissue localizations in the brain. Most of the 10 patients involved in the stu...
متن کاملRemoving unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data
Due to their relatively low-cost per sample and broad, gene-centric coverage of CpGs across the human genome, Illumina's 450k arrays are widely used in large scale differential methylation studies. However, by their very nature, large studies are particularly susceptible to the effects of unwanted variation. The effects of unwanted variation have been extensively documented in gene expression a...
متن کاملUsing control genes to correct for unwanted variation in microarray data.
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation fr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013